knitr document van Steensel lab
TF reporter barcode processing - pMT02 - stimulation 1
Introduction
18,000 TF reporters on pMT02 were transfected into mESCs and NPCs (7 conditions in total). Sequencing yielded barcode counts for these experiments, which are processed in this script.
Analysis
Add barcode annotation to barcode counts & extract first bc read count information
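The annotation step can be pictured as a left join of the counted barcodes onto the designed barcode list. The sketch below is a minimal illustration, not the code used in this report: the data frames `bc_counts` and `bc_annotation`, their column names, and the example barcodes are all hypothetical.

```r
# Hypothetical barcode counts from one sequencing sample
bc_counts <- data.frame(
  barcode = c("ACGTACGTACGT", "TTGACCATGCAA", "GGGGCCCCAAAA"),
  count   = c(120L, 45L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical annotation linking designed barcodes to reporters
bc_annotation <- data.frame(
  barcode  = c("ACGTACGTACGT", "TTGACCATGCAA"),
  reporter = c("TF_A_reporter", "TF_B_reporter"),
  stringsAsFactors = FALSE
)

# Left join: keep every counted barcode, whether or not it is annotated
annotated <- merge(bc_counts, bc_annotation, by = "barcode", all.x = TRUE)

# Barcodes without a reporter did not match the design
unmatched <- annotated[is.na(annotated$reporter), ]

# Fraction of reads that fall on designed barcodes
matched_fraction <-
  sum(annotated$count[!is.na(annotated$reporter)]) / sum(annotated$count)
```

Keeping the unmatched rows (`all.x = TRUE`) rather than dropping them makes it possible to inspect them afterwards.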
Take a closer look at unmatched barcodes
Check if the pDNA-bc count correlates with the barcode count in the pDNA-insert-seq data
Conclusion barcode clustering:
- I manually merged barcodes with high correlation and a Levenshtein distance of 1 into a single barcode to get more reads
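As a rough illustration of the distance criterion, the sketch below assigns each observed barcode to a designed barcode when their Levenshtein distance is at most 1, then sums the counts. It uses base R's `utils::adist`. The barcodes, counts, and variable names are hypothetical, and the correlation criterion mentioned above is not covered here.

```r
# Hypothetical designed barcodes and observed (possibly mutated) barcodes
designed <- c("ACGTACGTACGT", "TTGACCATGCAA")
observed <- data.frame(
  barcode = c("ACGTACGTACGT", "ACGTACGTACGA", "TTGACCATGCAA", "CCCCCCCCCCCC"),
  count   = c(100L, 7L, 50L, 2L),
  stringsAsFactors = FALSE
)

# Levenshtein distance matrix: rows = observed, columns = designed
d <- utils::adist(observed$barcode, designed)

# Nearest designed barcode per observed barcode
# (ties are resolved by taking the first match; a real pipeline
#  may prefer to discard ambiguous barcodes instead)
nearest  <- apply(d, 1, which.min)
min_dist <- apply(d, 1, min)

# Accept the assignment only for exact matches or distance 1
observed$assigned <- ifelse(min_dist <= 1, designed[nearest], NA_character_)

# Sum counts per designed barcode; unassigned (NA) rows are dropped
merged <- aggregate(count ~ assigned, data = observed, FUN = sum)
```

In this toy input the single-mismatch barcode `ACGTACGTACGA` contributes its reads to `ACGTACGTACGT`, while `CCCCCCCCCCCC` stays unassigned.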
Compare differently clustered pDNA data
Data quality plots
Normalization of barcode counts:
Divide cDNA barcode counts by pDNA barcode counts to get activity
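A minimal sketch of this ratio is shown below. It assumes that counts are first scaled to reads per million so that samples of different sequencing depth are comparable, and that barcodes without pDNA reads are excluded. Both choices, along with the data frame `counts` and its column names, are assumptions made for illustration.

```r
# Hypothetical raw counts for three barcodes
counts <- data.frame(
  barcode = c("bc1", "bc2", "bc3"),
  cDNA    = c(200, 50, 10),
  pDNA    = c(100, 100, 0)
)

# Scale to reads per million (assumed depth normalization)
rpm <- function(x) x / sum(x) * 1e6
counts$cDNA_rpm <- rpm(counts$cDNA)
counts$pDNA_rpm <- rpm(counts$pDNA)

# Activity = cDNA / pDNA; barcodes with zero pDNA reads are set to NA
counts$activity <- ifelse(
  counts$pDNA_rpm > 0,
  counts$cDNA_rpm / counts$pDNA_rpm,
  NA_real_
)
```

Setting zero-pDNA barcodes to `NA` avoids infinite activities that would otherwise distort downstream means.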
Calculate mean activity - filter out outlier barcodes
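The outlier rule is not stated in this report. The sketch below assumes one plausible rule: within each reporter, drop barcodes whose activity lies more than 3 median absolute deviations from the reporter median, then average the rest. The threshold `k = 3`, the helper `is_outlier`, and the example values are hypothetical.

```r
# Hypothetical per-barcode activities for two reporters (5 barcodes each)
bc_activity <- data.frame(
  reporter = rep(c("A", "B"), each = 5),
  activity = c(1.0, 1.1, 0.9, 1.05, 8.0,
               2.0, 2.2, 1.8, 2.1, 2.0)
)

# Flag values further than k scaled MADs from the median (assumed rule)
is_outlier <- function(x, k = 3) {
  spread <- mad(x)
  if (is.na(spread) || spread == 0) return(rep(FALSE, length(x)))
  abs(x - median(x)) > k * spread
}

# Apply the rule per reporter; ave() returns 0/1, so convert back to logical
bc_activity$outlier <-
  ave(bc_activity$activity, bc_activity$reporter, FUN = is_outlier) == 1

# Mean activity per reporter over the retained barcodes
mean_activity <- aggregate(
  activity ~ reporter,
  data = bc_activity[!bc_activity$outlier, ],
  FUN = mean
)
```

A median-based rule is used here because a single extreme barcode would inflate a mean and standard deviation computed on the same five values.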
Calculate correlations between technical replicates
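Replicate agreement can be summarized as a pairwise correlation matrix. The sketch below assumes Pearson correlation on log2-transformed activities. The transformation, the replicate names, and the values are illustrative assumptions, not taken from the data in this report.

```r
# Hypothetical mean activities of five reporters in three technical replicates
replicates <- data.frame(
  rep1 = c(1.0, 2.0, 4.0, 8.0, 16.0),
  rep2 = c(1.1, 1.9, 4.2, 7.6, 17.0),
  rep3 = c(0.9, 2.2, 3.8, 8.5, 15.0)
)

# Pairwise Pearson correlation on the log2 scale;
# pairwise.complete.obs tolerates missing activities in single replicates
cor_mat <- cor(
  log2(as.matrix(replicates)),
  method = "pearson",
  use = "pairwise.complete.obs"
)
```

The resulting symmetric matrix can be passed to a heatmap function or inspected pair by pair.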
Data quality plots - correlation between replicates
## `geom_smooth()` using formula 'y ~ x'
Session Info
paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time: 4.314425 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE-TF/gen-1_stimulation-1"
date()
## [1] "Tue Jul 13 17:18:04 2021"
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] pheatmap_1.0.12 PCAtools_2.2.0
## [3] ggrepel_0.9.1 DESeq2_1.30.1
## [5] SummarizedExperiment_1.20.0 Biobase_2.50.0
## [7] MatrixGenerics_1.2.1 matrixStats_0.59.0
## [9] GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
## [11] IRanges_2.24.1 S4Vectors_0.28.1
## [13] BiocGenerics_0.36.1 tidyr_1.1.3
## [15] LncFinder_1.1.4 gridExtra_2.3
## [17] RColorBrewer_1.1-2 readr_1.4.0
## [19] haven_2.4.1 ggbeeswarm_0.6.0
## [21] plotly_4.9.4.1 tibble_3.1.2
## [23] dplyr_1.0.7 vwr_0.3.0
## [25] latticeExtra_0.6-29 lattice_0.20-41
## [27] stringdist_0.9.6.3 GGally_2.1.2
## [29] ggpubr_0.4.0 ggplot2_3.3.5
## [31] stringr_1.4.0 plyr_1.8.6
## [33] data.table_1.14.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.2.1
## [3] lazyeval_0.2.2 splines_4.0.5
## [5] crosstalk_1.1.1 BiocParallel_1.24.1
## [7] digest_0.6.27 foreach_1.5.1
## [9] htmltools_0.5.1.1 fansi_0.5.0
## [11] magrittr_2.0.1 memoise_2.0.0
## [13] openxlsx_4.2.4 recipes_0.1.16
## [15] annotate_1.68.0 gower_0.2.2
## [17] jpeg_0.1-8.1 colorspace_2.0-2
## [19] blob_1.2.1 xfun_0.24
## [21] crayon_1.4.1 RCurl_1.98-1.3
## [23] jsonlite_1.7.2 genefilter_1.72.1
## [25] survival_3.2-10 iterators_1.0.13
## [27] glue_1.4.2 gtable_0.3.0
## [29] ipred_0.9-11 zlibbioc_1.36.0
## [31] XVector_0.30.0 seqinr_4.2-8
## [33] DelayedArray_0.16.3 BiocSingular_1.6.0
## [35] car_3.0-11 abind_1.4-5
## [37] scales_1.1.1 DBI_1.1.1
## [39] rstatix_0.7.0 Rcpp_1.0.7
## [41] viridisLite_0.4.0 xtable_1.8-4
## [43] dqrng_0.3.0 rsvd_1.0.5
## [45] foreign_0.8-81 bit_4.0.4
## [47] proxy_0.4-26 lava_1.6.9
## [49] prodlim_2019.11.13 htmlwidgets_1.5.3
## [51] httr_1.4.2 ellipsis_0.3.2
## [53] farver_2.1.0 pkgconfig_2.0.3
## [55] reshape_0.8.8 XML_3.99-0.6
## [57] nnet_7.3-15 locfit_1.5-9.4
## [59] utf8_1.2.1 caret_6.0-88
## [61] labeling_0.4.2 tidyselect_1.1.1
## [63] rlang_0.4.11 reshape2_1.4.4
## [65] AnnotationDbi_1.52.0 cachem_1.0.5
## [67] munsell_0.5.0 cellranger_1.1.0
## [69] tools_4.0.5 generics_0.1.0
## [71] RSQLite_2.2.7 ade4_1.7-17
## [73] broom_0.7.8 fastmap_1.1.0
## [75] evaluate_0.14 yaml_2.2.1
## [77] ModelMetrics_1.2.2.2 knitr_1.33
## [79] bit64_4.0.5 zip_2.2.0
## [81] purrr_0.3.4 sparseMatrixStats_1.2.1
## [83] nlme_3.1-152 compiler_4.0.5
## [85] beeswarm_0.4.0 curl_4.3.2
## [87] png_0.1-7 e1071_1.7-7
## [89] ggsignif_0.6.2 geneplotter_1.68.0
## [91] stringi_1.6.2 highr_0.9
## [93] forcats_0.5.1 Matrix_1.3-2
## [95] vctrs_0.3.8 pillar_1.6.1
## [97] lifecycle_1.0.0 irlba_2.3.3
## [99] cowplot_1.1.1 bitops_1.0-7
## [101] R6_2.5.0 rio_0.5.27
## [103] vipor_0.4.5 codetools_0.2-18
## [105] MASS_7.3-53.1 withr_2.4.2
## [107] GenomeInfoDbData_1.2.4 mgcv_1.8-34
## [109] hms_1.1.0 beachmat_2.6.4
## [111] grid_4.0.5 prettydoc_0.4.1
## [113] rpart_4.1-15 timeDate_3043.102
## [115] class_7.3-18 DelayedMatrixStats_1.12.3
## [117] rmarkdown_2.9 carData_3.0-4
## [119] pROC_1.17.0.1 lubridate_1.7.10